Assignment of position-specific error probability to primary DNA sequence data.

Authors

  • C B Lawrence
  • V V Solovyev
Abstract

DNA sequence predicted from polyacrylamide gel-based technologies is inaccurate because of variations in the quality of the primary data due to limitations of the technology, and to sequence-specific variations due to nucleotide interactions within the DNA molecule and with the gel. The ability to recognize the probability of error in the primary data will be useful in reconstructing the target sequence of a DNA sequencing project, and in estimating the accuracy of the final sequence. This paper describes the use of linear discriminant analysis to assign position-specific probabilities of incorrect, over- and under-prediction of nucleotides for each predicted nucleotide position in primary sequence data generated by a gel-based DNA sequencing technology. Using this method, most of the error potential in primary sequence data can be assigned to a limited number of discrete positions. The use of probability values in the sequence reconstruction process, and in estimating the accuracy of consensus sequence determination is described.
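
As a rough illustration of how position-specific error probabilities might feed into consensus determination, the sketch below combines the calls from several reads at one aligned position into a consensus base and an estimated consensus error. It is not the method of the paper: it ignores over- and under-prediction (insertions and deletions), and it assumes independent reads, a uniform prior over the four bases, and that an erroneous call is equally likely to be any of the other three bases. The function names `consensus_posterior` and `call_consensus` are hypothetical.

```python
from math import prod

BASES = "ACGT"


def consensus_posterior(calls):
    """Posterior probability of each true base at one aligned position.

    calls: list of (called_base, p_error) pairs, one per read.
    Illustrative assumptions (not taken from the paper): reads are
    independent, the prior over bases is uniform, and a wrong call is
    equally likely to be any of the three other bases.
    """
    likelihood = {}
    for true_base in BASES:
        # Probability of observing every call if true_base is correct.
        likelihood[true_base] = prod(
            (1.0 - p) if base == true_base else p / 3.0
            for base, p in calls
        )
    total = sum(likelihood.values())
    if total == 0.0:
        # Happens when conflicting calls all claim zero error probability.
        raise ValueError("calls are mutually inconsistent")
    return {b: likelihood[b] / total for b in BASES}


def call_consensus(calls):
    """Return (consensus_base, estimated_error_probability)."""
    posterior = consensus_posterior(calls)
    best = max(posterior, key=posterior.get)
    return best, 1.0 - posterior[best]
```

Calling `call_consensus([("A", 0.01), ("A", 0.05), ("C", 0.30)])` selects `A`, because the single conflicting call carries a high error probability and is down-weighted accordingly. Summing the per-position consensus error estimates over a sequence would give an expected number of errors under the same assumptions.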

Similar resources

Predicting Nucleosome Positioning Using Multiple Evidence Tracks

We describe a probabilistic model, implemented as a dynamic Bayesian network, that can be used to predict nucleosome positioning along a chromosome based on one or more genomic input tracks containing position-specific information (evidence). Previous models have either made predictions based on primary DNA sequence alone, or have been used to infer nucleosome positions from experimental data. ...

Parallelizing Assignment Problem with DNA Strands

Background: Many problems of combinatorial optimization, which are solvable only in exponential time, are known to be Non-Deterministic Polynomial hard (NP-hard). With the advent of parallel machines, new opportunities have emerged to develop effective solutions for NP-hard problems. However, solving these problems in polynomial time needs massive parallel machines and ...

Estimation of errors in "raw" DNA sequences: a validation study.

As DNA sequencing is performed more and more in a mass-production-like manner, efficient quality control measures become increasingly important for process control, but so also does the ability to compare different methods and projects. One of the fundamental quality measures in sequencing projects is the position-specific error probability at all bases in each individual sequence. Accurate pre...

Taxonomic Position of Iranian Isolates of Eretmocerus mundus (Mercet), a Parasitoid of Bemisia tabaci (Gennadius)

Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae) is one of the most important pests of vegetable and fruit crops. This polyphagous pest has a range of natural enemies, including the parasitoid Eretmocerus mundus (Mercet) (Hymenoptera: Aphelinidae). To determine the molecular profile and taxonomic status of Iranian isolates of E. mundus, parasitized B. tabaci samples were collected from cotton...

A Novel Protection Guaranteed, Quality of Transmission Aware Routing and Wavelength Assignment Algorithm for All-optical Networks

Transparent all-optical networks carry huge traffic, and any link failure can cause the loss of gigabits of data; hence protection and its guarantee become necessary at the time of failure. Many protection schemes have been presented in the literature, but none of them addresses protection guarantees. Also, in all-optical networks, due to the absence of regeneration capabilities, the physical layer i...

Journal:
  • Nucleic acids research

Volume 22, Issue 7

Pages -

Publication date: 1994